GRAPE for fast and scalable graph processing and random-walk-based embedding
Graph representation learning methods opened new avenues for addressing complex, real-world problems represented by graphs. However, many graphs used in these applications comprise millions of nodes and billions of edges and are beyond the capabilities of current methods and software implementations. We present GRAPE (Graph Representation Learning, Prediction and Evaluation), a software resource for graph processing and embedding that is able to scale with big graphs by using specialized and smart data structures, algorithms, and a fast parallel implementation of random-walk-based methods. Compared with state-of-the-art software resources, GRAPE shows an improvement of orders of magnitude in empirical space and time complexity, as well as competitive edge- and node-label prediction performance. GRAPE comprises approximately 1.7 million well-documented lines of Python and Rust code and provides 69 node-embedding methods, 25 inference models, a collection of efficient graph-processing utilities, and over 80,000 graphs from the literature and other sources. Standardized interfaces allow a seamless integration of third-party libraries, while ready-to-use and modular pipelines permit an easy-to-use evaluation of graph-representation-learning methods, therefore also positioning GRAPE as a software resource that performs a fair comparison between methods and libraries for graph processing and embedding.
Funding: National Center for Gene Therapy and Drugs based on RNA Technology, PNRR-NextGenerationEU program G43C22001320007; United States Department of Health & Human Services, National Institutes of Health (NIH) - USA, NIH National Cancer Institute (NCI) U01-CA239108-02; Transition Grant Line 1A Project NIMI PARTENARIATI H2020; 1R24OD011883-01; United States Department of Health & Human Services, National Institutes of Health (NIH) - USA U01-CA239108-02; DE-AC02-05CH11231; United States Department of Energy (DOE); European Union (EU), Marie Curie Actions; PSR2015-1720GVALE_01; PID2021-128970OA-I0
Het-node2vec: second order random walk sampling for heterogeneous multigraphs embedding
We introduce a set of algorithms (Het-node2vec) that extend the original
node2vec node-neighborhood sampling method to heterogeneous multigraphs, i.e.
networks characterized by multiple types of nodes and edges. The resulting
random walk samples capture both the structural characteristics of the graph
and the semantics of the different types of nodes and edges. The proposed
algorithms can focus their attention on specific node or edge types, allowing
accurate representations also for underrepresented types of nodes/edges that
are of interest for the prediction problem under investigation. These rich and
well-focused representations can boost unsupervised and supervised learning on
heterogeneous graphs.
Comment: 20 pages, 5 figures
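As a point of reference for the abstract above, the following is a minimal sketch of the second-order (node2vec-style) random walk that Het-node2vec is described as extending. It covers only the homogeneous case with the usual return parameter p and in-out parameter q; the type-aware sampling of Het-node2vec itself is not reproduced here. The function name, the toy graph, and the parameter values are illustrative assumptions, not taken from the paper.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Sample one second-order random walk with node2vec-style weights.

    adj maps each node to the set of its neighbors (undirected graph).
    p and q are the return and in-out parameters; all names here are
    illustrative assumptions, not an API from any library.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])  # sorted for reproducible sampling
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # The first step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move to distance 2 from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical toy graph stored as adjacency sets.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}
walk = node2vec_walk(adj, "a", length=6, p=0.5, q=2.0, rng=random.Random(0))
print(walk)
```

A heterogeneous variant would presumably add further weights that depend on node or edge types; how Het-node2vec defines them is not stated in this abstract, so the sketch leaves that step out.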
parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants.
BACKGROUND: Several prediction problems in computational biology and genomic medicine are characterized by both big data and a high imbalance between the examples to be learned, whereby positive examples can represent a tiny minority with respect to negative examples. For instance, deleterious or pathogenic variants are overwhelmed by the sea of neutral variants in the non-coding regions of the genome: thus, the prediction of deleterious variants is a challenging, highly imbalanced classification problem, and classical prediction tools either fail to detect the rare pathogenic examples among the huge number of neutral variants or face severe restrictions when managing big genomic data.
RESULTS: To overcome these limitations we propose parSMURF, a method that adopts a hyper-ensemble approach and oversampling and undersampling techniques to deal with imbalanced data, and parallel computational techniques to both manage big genomic data and substantially speed up the computation. The synergy between Bayesian optimization techniques and the parallel nature of parSMURF enables efficient and user-friendly automatic tuning of the hyper-parameters of the algorithm, and allows specific learning problems in genomic medicine to be easily fit. Moreover, by using MPI parallel and machine learning ensemble techniques, parSMURF can manage big data by partitioning them across the nodes of a high-performance computing cluster. Results with synthetic data and with single-nucleotide variants associated with Mendelian diseases and with genome-wide association study hits in the non-coding regions of the human genome, involving millions of examples, show that parSMURF achieves state-of-the-art results and an 80-fold speed-up with respect to the sequential version.
CONCLUSIONS: parSMURF is a parallel machine learning tool that can be trained to learn different genomic problems, and its multiple levels of parallelization and high scalability allow us to efficiently fit problems characterized by big and imbalanced genomic data. The C++ OpenMP multi-core version tailored to a single workstation and the C++ MPI/OpenMP hybrid multi-core and multi-node parSMURF version tailored to a High Performance Computing cluster are both available at https://github.com/AnacletoLAB/parSMURF
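The partition-and-resample idea mentioned in the abstract above can be illustrated with a small sketch. This is one possible reading under stated assumptions, not parSMURF's implementation: oversampling is done by plain duplication, and a nearest-centroid scorer stands in for whatever base learner the tool actually uses. All function names, parameters, and the toy data are hypothetical.

```python
import random


def build_balanced_partitions(positives, negatives, n_parts,
                              over_factor=2, under_ratio=1, seed=0):
    """Split the negatives into n_parts chunks and balance each one.

    Assumption: oversampling duplicates positives at random, and
    undersampling keeps at most under_ratio negatives per positive.
    """
    rng = random.Random(seed)
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    chunks = [shuffled[i::n_parts] for i in range(n_parts)]
    partitions = []
    for chunk in chunks:
        extra = [rng.choice(positives)
                 for _ in range(len(positives) * (over_factor - 1))]
        pos = positives + extra
        keep = min(len(chunk), under_ratio * len(pos))
        neg = rng.sample(chunk, keep)
        partitions.append((pos, neg))
    return partitions


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(pt[d] for pt in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_base(pos, neg):
    """Stand-in base learner: vote 1.0 if x is nearer the positive centroid."""
    c_pos, c_neg = centroid(pos), centroid(neg)
    return lambda x: 1.0 if sq_dist(x, c_pos) < sq_dist(x, c_neg) else 0.0


def ensemble_score(models, x):
    """Average the votes of all base learners."""
    return sum(model(x) for model in models) / len(models)


# Hypothetical toy data: 3 positives against 100 negatives.
positives = [(5.0, 5.0), (5.5, 4.5), (4.5, 5.5)]
negatives = [((i % 10) * 0.1, (i // 10) * 0.1) for i in range(100)]

partitions = build_balanced_partitions(positives, negatives, n_parts=4)
models = [train_base(pos, neg) for pos, neg in partitions]
print(ensemble_score(models, (5.0, 5.0)), ensemble_score(models, (0.0, 0.0)))
```

Each partition is trained independently of the others, which is the property that lets this kind of ensemble be distributed across cores or cluster nodes.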
GraPE: fast and scalable Graph Processing and Embedding
Graph Representation Learning methods have enabled a wide range of learning
problems to be addressed for data that can be represented in graph form.
Nevertheless, several real world problems in economy, biology, medicine and
other fields raised relevant scaling problems with existing methods and their
software implementation, due to the size of real world graphs characterized by
millions of nodes and billions of edges. We present GraPE, a software resource
for graph processing and random walk based embedding, that can scale with large
and high-degree graphs and significantly speed up computation. GraPE comprises
specialized data structures, algorithms, and a fast parallel implementation
that displays several orders of magnitude improvement in empirical space and
time complexity compared to state-of-the-art software resources, with a
corresponding boost in the performance of machine learning methods for edge and
node label prediction and for the unsupervised analysis of graphs. GraPE is
designed to run on laptop and desktop computers, as well as on high performance
computing clusters.
Metronomic Oral Vinorelbine: An Alternative Schedule in Elderly and Patients PS2 With Local/Advanced and Metastatic NSCLC Not Oncogene-addicted
The MILES and ELVIS studies showed that vinorelbine is one of the best options for elderly patients with advanced non-small-cell lung cancer (NSCLC). Oral vinorelbine at the standard schedule (60-80 mg/m2 weekly) has good activity in terms of response rates and progression-free survival. In recent years, a metronomic schedule of oral vinorelbine (40-50 mg/m2 three times a week, continuously) has been studied in phase II trials, especially in unfit and elderly patients. In the MOVE trial, metronomic oral vinorelbine had a clinical benefit [partial response (PR)+stable disease (SD) >12 weeks] in 58.1% of patients, with mild toxicity. On this basis, in 2017 we started a phase II study with metronomic oral vinorelbine in elderly (over 70 years) or unfit [Eastern Cooperative Oncology Group performance score (ECOG-PS) of 2] patients with locally advanced and metastatic NSCLC. Primary aims were clinical benefit (PR+SD ≥6 months) and toxicity; secondary aims were progression-free survival and overall survival.
Volatile lipophilic substances management in case of fatal sniffing.
Death due to inhalation of aliphatic hydrocarbons such as butane and propane is a particularly serious problem worldwide, with several fatal cases resulting from sniffing these volatile substances in order to "get high". Despite the number of published cases, there is no single approach to the management of fatal sniffing cases. In this paper we illustrate the management of volatile lipophilic substances in the case of a prisoner who died after sniffing a butane-propane gas mixture from prefilled camping stove gas canisters, discussing the comprehensive approach to the crime scene, autopsy, histology and toxicology. A large set of accurate values for both butane and propane was obtained by gas chromatography-mass spectrometry analysis of the following post-mortem biological samples: peripheral blood, heart blood, vitreous humor, liver, lung, heart, brain/cerebral cortex, fat tissue, and kidney; this allowed an in-depth discussion of the cause of death. Following the proper sampling approach during autopsy plays a key role.
Supervised learning with word embeddings derived from PubMed captures latent knowledge about protein kinases and cancer.
Inhibiting protein kinases (PKs) that cause cancers has been an important topic in cancer therapy for years. So far, almost 8% of >530 PKs have been targeted by FDA-approved medications, and around 150 protein kinase inhibitors (PKIs) have been tested in clinical trials. We present an approach based on natural language processing and machine learning to investigate the relations between PKs and cancers, predicting PKs whose inhibition would be efficacious to treat a certain cancer. Our approach represents PKs and cancers as semantically meaningful 100-dimensional vectors based on word and concept neighborhoods in PubMed abstracts. We use information about phase I-IV trials in ClinicalTrials.gov to construct a training set for random forest classification. Our results with historical data show that associations between PKs and specific cancers can be predicted years in advance with good accuracy. Our tool can be used to predict the relevance of inhibiting PKs for specific cancers and to support the design of well-focused clinical trials to discover novel PKIs for cancer therapy.